Origin and evolution of pathogenic coronaviruses


Severe acute respiratory 
syndrome 

A serious form of pneumonia 
that is characterized by diffuse 
alveolar damage and that has 
the potential to progress to 
acute respiratory distress. 

Type II pneumocytes 

Epithelial cells that line the 
lung alveoli; type II cells are 
round and produce surfactants 
to lower the surface tension of 
water and allow the membrane 
to separate, thereby increasing 
the capability to exchange 
gases. 
Abstract | Severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory
syndrome coronavirus (MERS-CoV) are two highly transmissible and pathogenic viruses that 
emerged in humans at the beginning of the 21st century. Both viruses likely originated in bats, and 
genetically diverse coronaviruses that are related to SARS-CoV and MERS-CoV were discovered in 
bats worldwide. In this Review, we summarize the current knowledge on the origin and evolution of 
these two pathogenic coronaviruses and discuss their receptor usage; we also highlight the 
diversity and potential of spillover of bat-borne coronaviruses, as evidenced by the recent spillover 
of swine acute diarrhoea syndrome coronavirus (SADS-CoV) to pigs. 


Coronaviruses cause respiratory and intestinal infections in animals and humans. They were not
considered to be highly pathogenic to humans until the outbreak of severe acute respiratory
syndrome (SARS) in 2002 and 2003 in Guangdong province, China 5, as the coronaviruses that
circulated before that time in humans mostly caused mild infections in immunocompetent people.
Ten years after SARS, another highly pathogenic coronavirus, Middle East respiratory syndrome
coronavirus (MERS-CoV), emerged in Middle Eastern countries 6. SARS coronavirus (SARS-CoV) uses
angiotensin-converting enzyme 2 (ACE2) as a receptor and primarily infects ciliated bronchial
epithelial cells and type II pneumocytes 7, whereas MERS-CoV uses dipeptidyl peptidase 4 (DPP4;
also known as CD26) as a receptor and infects unciliated bronchial epithelial cells and type II
pneumocytes 9-11. SARS-CoV and MERS-CoV were transmitted directly to humans from market civets
and dromedary camels, respectively 2-1, and both viruses are thought to have originated in
bats 5-21. Extensive studies of these two important coronaviruses have not only led to a better
understanding of coronavirus biology but have also been driving coronavirus discovery in bats
globally 1-31. In this Review, we focus on the origin and evolution of SARS-CoV and MERS-CoV.
Specifically, we emphasize the ecological distribution, genetic diversity, interspecies
transmission and potential for pathogenesis of SARS-related coronaviruses (SARSr-CoVs) and
MERS-related coronaviruses (MERSr-CoVs) found in bats, as this information can help prepare
countermeasures against future spillover and pathogenic infections in humans with novel
coronaviruses.

Coronavirus diversity 

Coronaviruses are members of the subfamily Coronavirinae in the family Coronaviridae and the
order Nidovirales (International Committee on Taxonomy of Viruses). This subfamily consists of
four genera (Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus), defined
on the basis of their phylogenetic relationships and genomic structures (FIG. 1). The
alphacoronaviruses and betacoronaviruses infect only mammals. The gammacoronaviruses and
deltacoronaviruses infect birds, but some of them can also infect mammals 24. Alphacoronaviruses
and betacoronaviruses usually cause respiratory illness in humans and gastroenteritis in
animals. The two highly pathogenic viruses, SARS-CoV and MERS-CoV, cause severe respiratory
syndrome in humans, and the other four human coronaviruses (HCoV-NL63, HCoV-229E, HCoV-OC43 and
HKU1) induce only mild upper respiratory diseases in immunocompetent hosts, although some of
them can cause severe infections in infants, young children and elderly individuals 1,28,29.
Alphacoronaviruses and betacoronaviruses can pose a heavy disease burden on livestock; these
viruses include porcine transmissible gastroenteritis virus, porcine enteric diarrhoea virus
(PEDV) and the recently emerged swine acute diarrhoea syndrome coronavirus (SADS-CoV). On the
basis of current sequence databases, all human coronaviruses have animal origins: SARS-CoV,
MERS-CoV, HCoV-NL63 and HCoV-229E are considered to have originated in bats; HCoV-OC43 and HKU1
likely originated from rodents 28,29. Domestic animals may have important roles as intermediate
hosts that enable virus transmission from natural hosts to humans. In addition, domestic animals
themselves can suffer disease caused by bat-borne or closely related coronaviruses: genomic
sequences highly similar to PEDV were detected in bats 5-3, and SADS-CoV is a recent spillover
from bats to pigs (FIG. 2). Currently, 7 of 11 ICTV-assigned Alphacoronavirus species and 4 of 9
Betacoronavirus species were identified only in bats (FIG. 3). Thus, bats are likely the major
natural reservoirs of alphacoronaviruses and betacoronaviruses 4.

Fig. 1 | The genomes, genes and proteins of different coronaviruses. Coronaviruses form enveloped
and spherical particles of 100-160 nm in diameter. They contain a positive-sense, single-stranded
RNA (ssRNA) genome of 27-32 kb in size. The 5′-terminal two-thirds of the genome encodes a
polyprotein, pp1ab, which is further cleaved into 16 non-structural proteins that are involved in
genome transcription and replication. The 3′ terminus encodes structural proteins, including
envelope glycoproteins spike (S), envelope (E), membrane (M) and nucleocapsid (N). In addition to
the genes encoding structural proteins, there are accessory genes that are species-specific and
dispensable for virus replication. Here, we compare prototypical and representative strains of
four coronavirus genera: feline infectious peritonitis virus (FIPV), Rhinolophus bat coronavirus
HKU2, severe acute respiratory syndrome coronavirus (SARS-CoV) strains GD02 and SZ3 from humans
infected during the early phase of the SARS epidemic and from civets, respectively, SARS-CoV
strain hTor02 from humans infected during the middle and late phases of the SARS epidemic, bat
SARS-related coronavirus (SARSr-CoV) strain WIV1, Middle East respiratory syndrome coronavirus
(MERS-CoV), mouse hepatitis virus (MHV), infectious bronchitis virus (IBV) and bulbul coronavirus
HKU11.

Animal origin and evolution of SARS-CoV 

At the beginning of the SARS epidemic, almost all early index patients had animal exposure before
developing disease. After the causative agent of SARS was identified, SARS-CoV and/or
anti-SARS-CoV antibodies were found in masked palm civets (Paguma larvata) and animal handlers in
a market place 2,16,39-42. However, later, wide-reaching investigations of farmed and wild-caught
civets revealed that the SARS-CoV strains found in market civets were transmitted to them by
other animals 39. In 2005, two teams independently reported the discovery of novel coronaviruses
related to human SARS-CoV, which were named SARS-CoV-related viruses or SARS-like coronaviruses,
in horseshoe bats (genus Rhinolophus) 4. These discoveries suggested that bats may be the natural
hosts for SARS-CoV and that civets were only intermediate hosts. Subsequently, many coronaviruses
phylogenetically related to SARS-CoV (SARSr-CoVs) were discovered in bats from different
provinces in China and also from European, African and Southeast Asian countries 15,20,38,43-54
(FIG. 4; Supplementary Fig. S1a). According to the ICTV criteria, only the strains found in
Rhinolophus bats in European countries, Southeast Asian countries and China are SARSr-CoV
variants. Those from Hipposideros bats in Africa are less closely related to SARS-CoV and should
be classified as a new coronavirus species 4. These data indicate that SARSr-CoVs have wide
geographical spread and might have been prevalent in bats for a very long time. A 5-year
longitudinal study revealed the coexistence of highly diverse SARSr-CoVs in bat populations in
one cave of Yunnan province, China 8,20,55. This location is a diversity hot spot, and the
SARSr-CoVs in this location contain all the genetic diversity found in other locations of China.
Furthermore, the viral strains that exist in this one location contain all genetic elements that
are needed to form SARS-CoV (FIG. 5). As no direct progenitor of SARS-CoV was found in bat
populations despite 15 years of searching and as RNA recombination is frequent within
coronaviruses, it is highly likely that SARS-CoV newly emerged through recombination of bat
SARSr-CoVs in this or other yet-to-be-identified bat caves. This hypothesis is consistent with
previous data showing that a direct progenitor of SARS-CoV emerged before 2002 (REFS 42,57,58).
Recombination analysis also strongly supported the hypothesis that the civet SARS-CoV strain SZ3
arose through recombination of two existing bat strains, WIV16 and Rf4092 (REF. 20).

[Fig. 2 panel labels: Genetically diverse coronaviruses; Natural host; Intermediate host; Human
host. Arrow legend: Spillover to intermediate hosts; Mild infection; Severe infection.]

Fig. 2 | Animal origins of human coronaviruses. Severe acute respiratory syndrome coronavirus
(SARS-CoV) is a new coronavirus that emerged through recombination of bat SARS-related
coronaviruses (SARSr-CoVs). The recombined virus infected civets and humans and adapted to these
hosts before causing the SARS epidemic 42,62. Middle East respiratory syndrome coronavirus
(MERS-CoV) likely spilled over from bats to dromedary camels at least 30 years ago and since then
has been prevalent in dromedary camels. HCoV-229E and HCoV-NL63 usually cause mild infections in
immunocompetent humans. Progenitors of these viruses have recently been found in African
bats 33,134, and the camelids are likely intermediate hosts of HCoV-229E 34135. HCoV-OC43 and
HKU1, both of which are also mostly harmless in humans, likely originated in rodents. Recently,
swine acute diarrhoea syndrome (SADS) emerged in piglets. This disease is caused by a novel
strain of Rhinolophus bat coronavirus HKU2, named SADS coronavirus (SADS-CoV); there is no
evidence of infection in humans. Solid arrows indicate confirmed data. Broken arrows indicate
potential interspecies transmission. Black arrows indicate infection in the intermediate animals,
yellow arrows indicate a mild infection in humans, and red arrows indicate a severe infection in
humans or animals.


Furthermore, WIV16, the closest relative to SARS-CoV found in bats, likely arose through
recombination of two other prevalent bat SARSr-CoV strains 20. The most frequent recombination
breakpoints are within the S gene, which encodes the spike (S) protein that contains the
receptor-binding domain (RBD), and upstream of orf8, which encodes an accessory protein 58,59.
Given the prevalence and great genetic diversity of bat SARSr-CoVs, their close coexistence and
the frequent recombination of the coronaviruses, it is expected that novel variants will emerge
in the future 50,61. Because there were no SARS cases in Yunnan province during the SARS
outbreak, we hypothesize that the direct progenitor of SARS-CoV was produced by recombination
within bats and then transmitted to farmed civets or another mammal, which then transmitted the
virus to civets by faecal-oral transmission. When the virus-infected civets were transported to
Guangdong market, the virus spread in market civets and acquired further mutations before
spillover to humans.

Variability of SARS-CoV in humans and civets 

The genome sequences of SARS-CoVs from market civets are almost identical to the genomes of human
SARS-CoVs 2,62. However, two genes show major variation. The first variable region is located in
the S gene. The SARS-CoV S protein is functionally divided into two subunits, denoted S1 and S2,
which are responsible for receptor binding and fusion with the cellular membrane, respectively.
S1 is further divided into the amino-terminal domain (S1-NTD) and the carboxy-terminal domain
(S1-CTD). The S1-CTD functions as the RBD and is responsible for binding ACE2 and entering
cells 63,64. Two amino acid residues in the RBD, 479 and 487, were identified to be essential for
ACE2-mediated SARS-CoV infection and critical for virus transmission from civets to humans 6 7.

Fig. 3 | Phylogenetic relationships in the Coronavirinae subfamily. The highly human-pathogenic
coronaviruses belong to the subfamily Coronavirinae from the family Coronaviridae. The viruses in
this subfamily group into four genera (prototype or representative strains shown):
Alphacoronavirus (purple), Betacoronavirus (pink), Gammacoronavirus (green) and Deltacoronavirus
(blue). Classic subgroup clusters are labelled 1a and 1b for the alphacoronaviruses and 2a-2d for
the betacoronaviruses. The tree is based on published trees of Coronavirinae 5,136 and
reconstructed with sequences of the complete RNA-dependent RNA polymerase-coding region of the
representative coronaviruses (maximum likelihood method under the GTR + I + Γ model of nucleotide
substitution as implemented in PhyML, version 3.1 (REF.)). Only nodes with bootstrap support
above 70% are shown. IBV, infectious bronchitis virus; MERS-CoV, Middle East respiratory syndrome
coronavirus; MHV, mouse hepatitis virus; PEDV, porcine enteric diarrhoea virus; SARS-CoV, severe
acute respiratory syndrome coronavirus; SARSr-CoV, SARS-related coronavirus.

The second major location of variation is the accessory gene orf8 (FIG. 5). On the basis of SARS
spread, the SARS 2002-2003 outbreak could be divided into three phases, with the early phase
characterized by a limited number of localized cases, followed by a middle phase during which a
superspreader event occurred in a hospital and finally the late phase of international spread.
The viral genomes from early-phase patients contain two genotypes of orf8, one with a complete
orf8 (369 nucleotides) and the other containing an 82-nucleotide deletion. By contrast, viral
genomes from late-phase patients and most of the genomes from middle-phase patients contain a
split orf8 (orf8a and orf8b) owing to a 29-nucleotide deletion; two exceptions were found in
middle-phase genomes, one containing an 82-nucleotide deletion in orf8 and the other with the
whole orf8 deleted. The human isolates from 2004 and all civet SARS-CoV genomes have a complete
orf8 except one civet strain with an 82-nucleotide deletion 2. These data indicate that orf8
genes underwent adaptations during transmission from animals to humans during the SARS epidemic.
A limited functional analysis suggested that the ORF8a protein is dispensable for SARS-CoV
replication in Vero E6 cells but may have a role in modulating endoplasmic reticulum stress,
inducing apoptosis and inhibiting interferon responses in host cells 0,65-69. Whether and how
these adaptations were involved in SARS-CoV virulence are not fully clarified.
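
The orf8 lengths quoted above lend themselves to a quick arithmetic check. The sketch below is only an illustration and is not taken from the Review: the function name `describe_deletion` is hypothetical, and the only inputs are the figures stated in the text (a complete orf8 of 369 nucleotides, and deletions of 29 and 82 nucleotides). It reports the remaining length and whether the deletion length is a multiple of three; a deletion that is not a multiple of three shifts the downstream reading frame, which fits the 29-nucleotide deletion leaving a split orf8 rather than a shortened single frame.

```python
ORF8_FULL_LENGTH = 369  # nucleotides; complete orf8 as stated in the text


def describe_deletion(full_length: int, deletion: int) -> dict:
    """Summarise a deletion: remaining length and whether it preserves the frame.

    Hypothetical helper for illustration only; it performs arithmetic on the
    lengths quoted in the text and makes no claim about protein function.
    """
    return {
        "deletion": deletion,
        "remaining": full_length - deletion,
        # A deletion keeps the downstream reading frame only if its length
        # is a multiple of three (one codon = three nucleotides).
        "in_frame": deletion % 3 == 0,
    }


if __name__ == "__main__":
    for deletion_length in (29, 82):  # deletions reported for SARS-CoV orf8
        print(describe_deletion(ORF8_FULL_LENGTH, deletion_length))
```

Both deletions fail the multiple-of-three check, so neither can simply shorten the encoded protein in frame; what each truncated or split product does in the cell is a separate, experimental question.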

Variability of bat SARSr-CoVs 

SARS-CoVs and bat SARSr-CoVs mainly vary in three regions: S, ORF8 and ORF3 (FIG. 5). Bat
SARSr-CoVs share high sequence identity with SARS-CoV in the S2 region but are highly variable in
the S1 region. Compared with human and civet SARS-CoV, bat SARSr-CoV S1 can be divided into two
clades: clade 1, which is found only in Yunnan province, has an S protein of the same size as
human and civet isolates 8 20,5, whereas clade 2, which is found in many locations, has a
shorter S protein owing to deletions of 5, 12 or 13 amino acids 5,43-45,48,50. Among the
sequenced bat SARSr-CoVs, those with deletions in their RBDs show 78.2-80.2% amino acid sequence
identity with SARS-CoV in the S protein, whereas those without deletions are much more closely
related to SARS-CoV, with 90.0-97.2% amino acid sequence identity.
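
For readers unfamiliar with how a figure such as "90.0-97.2% amino acid sequence identity" is obtained, the following is a minimal sketch of one common definition: the share of identical residues among aligned columns in which neither sequence has a gap. It is an assumption that this matches the definition behind the numbers above, because the Review does not state the alignment tool or the denominator used. The function name `percent_identity` and the toy sequences are hypothetical and are not real spike protein sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, '-' marks a gap).
    Other conventions divide by the full alignment length or by the shorter
    sequence, so values are only comparable under the same convention.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned peptides (hypothetical): 8 of 10 positions are identical.
    print(percent_identity("ACDEFGHIKL", "ACDEYGHIRL"))
```

Under this convention, deleted positions are skipped rather than counted as mismatches, so a deletion lowers identity only through the substitutions that accompany it.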
The second variable region is located in ORF8. Most bat SARSr-CoVs retain an intact orf8 (366 or
369 nucleotides) and share 47.7-100% sequence identity among themselves and 50.6-98.4% with
SARS-CoV in civets and early-phase patients. A split orf8 (364 nucleotides) owing to a
5-nucleotide deletion was found in one bat SARSr-CoV strain, similar to that of SARS-CoVs from
middle-phase and late-phase patients. The European bat SARSr-CoV has completely lost orf8
(REF. 45). These data show that the orf8 genes in bat SARSr-CoVs are constantly evolving in their
natural reservoirs. Considering the variability of orf8 in bats, civets and humans, investigating
the function of orf8 is a priority, particularly the contribution of these different variants to
viral pathogenesis.

The third variable region is in ORF3. The SARS-CoV genome encodes a 154-amino acid ORF3b, which
is an interferon antagonist. Bat SARSr-CoVs and SARS-CoV are highly similar in ORF3a (96.4-98.9%
amino acid identity), but bat SARSr-CoVs have different sizes of ORF3b (54-154 amino acids) (a
large part of the region encoding ORF3b overlaps with the ORF3a coding region). ORF3b retains the
anti-interferon function in some bat SARSr-CoVs but has lost the function in other bat
SARSr-CoVs 70.

A novel accessory gene, named orfx and located between orf6 and orf7, was identified in the
genomes of several bat SARSr-CoVs from Yunnan province 8-20 (FIG. 5). A preliminary study
indicated that ORFX is involved in an anti-interferon response.

Receptor usage of SARS-CoV and SARSr-CoV 

ACE2 binding is a critical determinant for the host range of SARS-CoV 3. Electron microscopic
studies have shown that the SARS-CoV S protein forms a clover-shaped trimer, with three S1 heads
and a trimeric S2 stalk 74,75. The RBD is located on the tip of each S1 head. The RBD binds to
the outer surface of ACE2, away from its zinc-chelating enzymatic site 7,141 (FIG. 6a). Different
SARS-CoV strains isolated from several hosts vary in their binding affinities for human ACE2 and
consequently in their infectivity of human cells 5,78 (FIG. 6b). The epidemic strain hTor02 was
isolated from humans during the late phase of the outbreak in 2002-2003. It has a high affinity
for human ACE2 and high infectivity in human cells, and consequently, it was transmitted
efficiently between humans 2. Strains cSz02 and cHb05 were isolated from palm civets in 2002-2003
and 2005, respectively. Both have low affinity for human ACE2 and low infectivity in human cells
but have high affinity for civet ACE2 and high infectivity in civet cells 12,79. Strain hcGd03
was isolated from both humans and palm civets in 2003-2004 and has moderate affinity for human
ACE2 and moderate infectivity in human cells; it infected humans but did not transmit between
humans. Strain hHae08 was isolated from human cell culture and has high affinity for human ACE2
and high infectivity in human cells. Understanding the molecular basis for human receptor usage
by different SARS-CoV strains is crucial for understanding the cross-species transmission of
SARS-CoV and for epidemiological monitoring of potential future outbreaks.


[Fig. 4 panel labels: a, SARSr-CoVs (YN, BG); b, MERSr-CoVs (B2, L2).]

Fig. 4 | Phylogenetic analysis of SARSr-CoVs and MERSr-CoVs. a | The figure shows a simplified
phylogenetic tree of severe acute respiratory syndrome-related coronaviruses (SARSr-CoVs) from
bats. SARSr-CoVs cluster into three lineages, L1-L3, and human severe acute respiratory syndrome
coronaviruses (SARS-CoVs) embed in L1. Two individual SARSr-CoVs do not cluster into these
lineages: YN, a virus isolated from Yunnan province, China, and BG, a virus from Bulgaria,
Europe. The tree is based on published trees 0,13 and reconstructed using sequences of the
complete RNA-dependent RNA polymerase-coding region (maximum likelihood method under the
GTR + I + Γ model of nucleotide substitution as implemented in PhyML, version 3.1 (REF. 1)). The
strain Zhejiang2013 (GenBank No. KF636752) was used as a root. b | By contrast, Middle East
respiratory syndrome-related coronaviruses (MERSr-CoVs) form two major viral lineages, L1 and L2.
L1 is found in humans and camels, and L2 is found only in camels. Two small clusters, B1 (bat 1)
and B2, and one single virus, SA, from South Africa, were found in bats. The phylogenetic tree of
MERSr-CoVs is based on published trees 3413 and reconstructed using full-genome alignment of all
coding regions using the same method as above. HKU4-1 (EF065505) and HKU5-1 (EF065509), two 2c
betacoronaviruses, served as the root of the tree. Detailed phylogenetic trees and grouping
information can be found in Supplementary Fig. S1. MERS-CoV, Middle East respiratory syndrome
coronavirus.

Fig. 5 | Variable regions in different SARS-CoV and bat SARSr-CoV isolates. Variability and thus
species adaptation mainly affect three severe acute respiratory syndrome coronavirus (SARS-CoV)
and SARS-related coronavirus (SARSr-CoV) proteins: the spike protein (S) (both the S1
amino-terminal domain (S1-NTD) and the S1 receptor-binding domain (S1-RBD) show variability),
ORF3 (3a and 3b) and ORF8 (8a and 8b). SARS-CoV GD02 and hTor02 represent strains that were
isolated from patients during the early, and middle or late phase of the SARS epidemic in
2002-2003, respectively; SARS-CoV SZ3 is a representative of strains isolated from civets in 2003
and 2004 (REFS 42,62). All bat SARSr-CoVs, except HKU3 and Rp3, were discovered in Yunnan
province during 2011-2015. On the basis of deletions in the RBD, bat SARSr-CoVs can be divided
into two clades. Those without a deletion and thus an identical size in S1 to SARS-CoV can be
further divided into four genotypes: genotype 1, represented by WIV16, is highly similar to
SARS-CoV in both the NTD and the RBD; genotype 2, represented by WIV1, differs in NTD from
SARS-CoV; genotype 3, represented by Rs4231, differs in RBD from SARS-CoV; and genotype 4,
represented by SHC014 and Rs4084, differs in both NTD and RBD from SARS-CoV. The differences in S
influence species-specific receptor binding, whereas differences in the accessory proteins,
including potentially the newly discovered ORFX (X), mainly affect immune responses and viral
immune evasion. Adapted from REF. 20, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

SARS-CoV mutations that affect human and civet receptor binding. Crystal structures of the
SARS-CoV RBD complexed with human ACE2 revealed that the SARS-CoV RBD contains a core structure
and a receptor-binding motif (RBM) 2,141 (FIG. 6a). Two virus-binding hot spots have been
identified at the interface of the RBD and human ACE2, centring on ACE2 residues Lys31 (hot
spot 31) and Lys353 (hot spot 353) 3,84 (FIG. 6b). They both consist of a salt bridge (between
Lys31 and Glu35 for hot spot 31 and between Lys353 and Asp38 for hot spot 353); both salt bridges
are buried in hydrophobic pockets and contribute a substantial amount of energy to RBD-ACE2
binding as well as filling voids at the RBD-ACE2 interface. Naturally selected RBM mutations all
interact with the hot spots (FIG. 6b; TABLE 1) and affect RBD-ACE2 binding.


Mutations in RBM residue 479 had an important role in the civet-to-human transmission of
SARS-CoV 2,76,78,85. Residue 479 is an asparagine in strains hTor02, hcGd03 and hHae08 but is a
lysine in strain cSz02 and an arginine in strain cHb05 (TABLE 1). Asn479 is located near hot
spot 31, without interfering with the structure of hot spot 31 (REF.) (FIG. 6b, c). However, a
change to Lys479 leads to steric and electrostatic interference with hot spot 31, reducing the
binding affinity between the SARS-CoV RBD and human ACE2. By contrast, Arg479 reaches the
vicinity of hot spot 353 and forms a salt bridge with ACE2 residue Asp38 (REF.) (FIG. 6d). Hence,
strains hTor02, hcGd03 and hHae08 (all of which contain Asn479) and strain cHb05 (which contains
Arg479) recognize human ACE2 and infect human cells efficiently, whereas strain cSz02 (which
contains Lys479) recognizes human ACE2 inefficiently and infects human cells inefficiently. The
above structural analyses are supported by biochemical, functional and epidemiological
data 2,76,78,83-85. Because of residue differences between human ACE2 and civet ACE2, both Asn479
and Lys479 fit well into the interface between the RBD and civet ACE2, although Arg479 fits even
better 83,85; consequently, strains hTor02, cSz02, hcGd03 and cHb05 (which contain either Asn479,
Lys479 or Arg479) recognize civet ACE2 and infect civet cells efficiently 9. In sum, Asn479 and
Arg479 are viral adaptations to human ACE2, whereas Lys479 is incompatible with human ACE2;
Arg479 is a viral adaptation to civet ACE2, whereas Asn479 and Lys479 are also compatible with
civet ACE2.

Mutations in RBM residue 487 had an important role in the human-to-human transmission of
SARS-CoV. Residue 487 is a threonine in strain hTor02 but is a serine in the other strains
isolated from humans and civets. The methyl group of Thr487 interacts with hot spot 353 in human
ACE2 by providing stacking support for the formation of the salt bridge between Lys353 and Asp38;
consequently, strain hTor02 recognizes human ACE2 efficiently and was transmitted between humans
during the 2002-2003 SARS epidemic. By contrast, Ser487 cannot provide support to hot spot 353,
and hence the other strains isolated from humans and civets recognize human ACE2 inefficiently.
Consequently, neither cSz02 nor hcGd03 was transmitted between humans. The above structural
analyses are supported by biochemical, functional and epidemiological data 2,76,78,83-85. Because
of residue differences between human ACE2 and civet ACE2, Ser487 fits well into the RBD-civet
ACE2 interface although still not as well as Thr487 (REFS 83,85); consequently, strains cSz02,
hcGd03 and cHb05 (which contain Ser487) recognize civet ACE2 and infect civet cells efficiently.
In sum, Thr487 is a viral adaptation to both human and civet ACE2, and Ser487 is much more
compatible with civet ACE2 than with human ACE2 (FIG. 6b).
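
The residue-by-residue conclusions for positions 479 and 487 can be collected into a small lookup, which makes it easier to see how each strain combines the two sites. The sketch below is a bookkeeping aid only, under stated assumptions: the residues are transcribed from TABLE 1 for the five naturally occurring strains, the labels paraphrase the "In sum" statements above, and the names `RBM_RESIDUES`, `HUMAN_ACE2_NOTES` and `human_ace2_profile` are hypothetical. It does not model binding affinity and should not be read as a predictor.

```python
# Residues at RBM positions 479 and 487, transcribed from TABLE 1
# (naturally occurring strains only).
RBM_RESIDUES = {
    "hTor02": {479: "Asn", 487: "Thr"},
    "cSz02": {479: "Lys", 487: "Ser"},
    "hcGd03": {479: "Asn", 487: "Ser"},
    "cHb05": {479: "Arg", 487: "Ser"},
    "hHae08": {479: "Asn", 487: "Ser"},
}

# Qualitative labels paraphrasing the text's summary for human ACE2.
HUMAN_ACE2_NOTES = {
    (479, "Asn"): "adapted",
    (479, "Arg"): "adapted",
    (479, "Lys"): "incompatible",
    (487, "Thr"): "adapted",
    (487, "Ser"): "less compatible",
}


def human_ace2_profile(strain: str) -> dict:
    """Return the text's qualitative human-ACE2 label for each position."""
    residues = RBM_RESIDUES[strain]
    return {
        position: HUMAN_ACE2_NOTES[(position, residue)]
        for position, residue in residues.items()
    }


if __name__ == "__main__":
    for name in RBM_RESIDUES:
        print(name, human_ace2_profile(name))
```

Laid out this way, hTor02 is the only listed strain labelled "adapted" at both positions, in line with the text's account of it as the strain that spread efficiently between humans.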

RBM residues 442, 472 and 480 also contribute to receptor recognition and host range of SARS-CoV
although not as much as residues 479 and 487. Detailed structural, biochemical and functional
analyses showed that Phe442, Phe472 and Asp480 are viral adaptations to human ACE2, whereas
Tyr442, Leu472 or Pro472, and Gly480 are viral adaptations to civet ACE2 (REFS 2,8). To
corroborate the importance of these residues for SARS-CoV binding to either human or civet ACE2,
two SARS-CoV S proteins, hOptimize and cOptimize, were rationally designed: the former contains
all of the human ACE2-adapted residues (Phe442, Phe472, Asn479, Asp480 and Thr487), whereas the
latter contains the civet ACE2-adapted residues (Tyr442, Pro472, Arg479, Gly480 and Thr487).
These two S proteins demonstrate exceptionally high affinity for human ACE2 and civet ACE2,
confirming that the human ACE2-adapted and civet ACE2-adapted RBM residues help determine
SARS-CoV host range 8. In addition to receptor binding, proteolytic cleavage of S and potentially
other mutations that affect virion and trimer stability may also be important for virus
transmissibility in different hosts, and these factors need to be studied further.

Fig. 6 | Receptor recognition by SARS-CoV and MERS-CoV. a | Severe acute respiratory syndrome
coronavirus (SARS-CoV) uses its receptor-binding domain (RBD) (as shown in the structure of
strain hTor02, containing core structure (cyan) and receptor-binding motif (RBM; magenta)) to
bind human angiotensin-converting enzyme 2 (ACE2; green; Protein Data Bank ID: 2AJF). ACE2 is a
peptidase with zinc (blue) in its active centre. b | Several residues in the host and viral
receptor, as well as two salt bridges that stabilize the structure (dotted lines) and form two
binding hot spots, are crucial for binding of the severe acute respiratory syndrome (SARS)
epidemic strain hTor02. Hydrophobic residues surrounding the two salt bridges are present in the
structure but are not shown in the figure. c | By contrast, the SARS-related coronavirus
(SARSr-CoV) strain bWIV1, which was isolated from bats and can infect both civet and human cells,
differs in residues 442, 472 and 487. The mutation from threonine to asparagine in residue 487
introduces a polar side chain and is predicted to interfere with binding at hot spot 353. The
model shown here was built on the basis of the structure of hTor02 RBD complexed with human ACE2
(Protein Data Bank ID: 2AJF), in which residues 442, 472 and 487 were mutated from those in
strain hTor02 to those in strain bWIV1. d | The bat SARSr-CoV strain bRsSHC014 can also infect
human and civet cells; it carries an alanine in position 487, and the short side chain of this
residue does not support the structure of hot spot 353. The model was built on the basis of the
structure of cOptimize RBD complexed with human ACE2 (Protein Data Bank ID: 3SCJ), in which
residues 442, 480 and 487 were mutated from those in strain cOptimize to those in strain bWIV1.
e | The Middle East respiratory syndrome coronavirus (MERS-CoV) RBD (core structure in cyan and
RBM in magenta) binds human dipeptidyl peptidase 4 (DPP4; green; Protein Data Bank ID: 4KR0).
Structure figures were made using PyMOL 5. Modelled mutations in panels c and d were performed
using Coot. Panels a-d are adapted from REF.; this research was originally published in The
Journal of Biological Chemistry. Wu, K. L., Peng, G. Q., Wilken, M., Geraghty, R. J. & Li, F.
Mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus.
J. Biol. Chem. 2012; 287:8904-8911. © American Society for Biochemistry and Molecular Biology.

Table 1 | Mutations in the receptor-binding motif of SARS-CoV

Strain     Host             Year             Truncation in           Residue  Residue  Residue  Residue  Residue
                                             receptor-binding motif  442      472      479      480      487
hTor02     Human            2002-2003        No                      Tyr      Leu      Asn      Asp      Thr
cSz02      Civet            2002-2003        No                      Tyr      Leu      Lys      Asp      Ser
hcGd03     Human and civet  2003-2004        No                      Tyr      Pro      Asn      Gly      Ser
cHb05      Civet            2005             No                      Tyr      Pro      Arg      Gly      Ser
hHae08     Human            2008             No                      Phe      Phe      Asn      Asp      Ser
hOptimize  Human            In vitro design  No                      Phe      Phe      Asn      Asp      Ser
cOptimize  Civet            In vitro design  No                      Tyr      Pro      Arg      Gly      Thr
bHKU3      Bat              2005             Yes                     Ser      Gly      Ser      Thr      Val
bWIV1      Bat              2013             No                      Ser      Phe      Asn      Asp      Asn
bRsSHC014  Bat              2013             No                      Trp      Pro      Arg      Pro      Ala

Data from Hu et al. 2012 (REF. 83). SARS-CoV, severe acute respiratory syndrome coronavirus.

SARSr-CoV mutations that affect receptor binding. 

To date, numerous SARSr-CoV strains have been identified from bats 5,1 8 2. These bat SARSr-CoVs
are the likely progenitors of SARS-CoV that infected humans and civets, and hence understanding
their interactions with human or civet ACE2 is critical for tracing the origins of SARS-CoV and
for preventing and controlling future SARS-CoV outbreaks in humans. The RBD sequences of these
bat SARSr-CoVs fall into three major groups; the representative strains from each group are bHKU3
(isolated in 2005), bWIV1 (isolated in 2013) and bRsSHC014 (isolated in 2013) (TABLE 1). Strains
bWIV1 and bRsSHC014, but not strain bHKU3, use both human and civet ACE2 and hence can infect
both human and civet cells 16,18-20,86,8. Strain bHKU3 has a truncated RBM (TABLE 1), which
distorts the structure of the RBM and abolishes its binding to human and civet ACE2. Neither
strain bWIV1 nor strain bRsSHC014 contains truncations in its RBM, and hence, their RBMs likely
retain the same structure as SARS-CoV RBMs. Here, we analysed the potential interactions between
these two strains (bWIV1 and bRsSHC014) and human ACE2 by building homology structural models of
their RBDs complexed with human ACE2, focusing on residues 479 and 487 (FIG. 6c, d). Strain bWIV1
contains Asn479 and Asn487 in its RBM. Whereas Asn479 is a viral adaptation to human ACE2, the
polar side chain of Asn487 may have unfavourable interactions with the aliphatic portion of
residue Lys353 in human ACE2, which is part of hot spot 353 (FIG. 6c). Strain bRsSHC014 contains
Arg479 and Ala487 in its RBM. Whereas Arg479 is a viral adaptation to human ACE2, the small side
chain of Ala487 does not provide support to the structure of hot spot 353 (FIG. 6d). Therefore,
although both bWIV1 and bRsSHC014 can infect human airway cells, they bind human ACE2 less well
than hTor02 and produce less severe symptoms than the epidemic strain of SARS-CoV in vivo 8,89.
Similarly, both bWIV1 and bRsSHC014 can infect civet cells, but they bind civet ACE2 less well
than cSz02. Thus, it is predicted that both strains will be attenuated compared with early-phase or
late-phase human SARS epidemic viruses. Future evolu¬ 
tion of bat SARSr-CoV strains bWIVl and bRsSHC014 
in crucial RBM residues may allow them to cross the 
species barriers between bats, civets and humans, posing 
potential health threats. 
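
The residue comparison described above can be made explicit with a small
lookup. The following is a minimal sketch, not part of the original
analysis: it transcribes the five RBM positions for seven of the strains
in Table 1 and lists the positions at which a strain differs from a
reference strain. The names RBM_POSITIONS, RBM_RESIDUES and
rbm_differences are hypothetical and introduced only for illustration.
Counting differences is a bookkeeping aid; it is not a measure of ACE2
binding affinity, which depends on the structural context of each residue.

```python
# Residues at five receptor-binding motif (RBM) positions, transcribed from
# Table 1 of this Review for a subset of strains. All names are illustrative.
RBM_POSITIONS = (442, 472, 479, 480, 487)

RBM_RESIDUES = {
    "hTor02":    ("Tyr", "Leu", "Asn", "Asp", "Thr"),
    "cSz02":     ("Tyr", "Leu", "Lys", "Asp", "Ser"),
    "hcGd03":    ("Tyr", "Pro", "Asn", "Gly", "Ser"),
    "cHb05":     ("Tyr", "Pro", "Arg", "Gly", "Ser"),
    "bHKU3":     ("Ser", "Gly", "Ser", "Thr", "Val"),
    "bWIV1":     ("Ser", "Phe", "Asn", "Asp", "Asn"),
    "bRsSHC014": ("Trp", "Pro", "Arg", "Pro", "Ala"),
}


def rbm_differences(strain, reference="hTor02"):
    """Map each differing position to (reference residue, strain residue)."""
    ref = RBM_RESIDUES[reference]
    qry = RBM_RESIDUES[strain]
    return {
        pos: (r, q)
        for pos, r, q in zip(RBM_POSITIONS, ref, qry)
        if r != q
    }


if __name__ == "__main__":
    for name in ("bWIV1", "bRsSHC014", "bHKU3"):
        print(name, rbm_differences(name))
```

In this tabulation, bWIV1 shares Asn479 with hTor02 but differs at
position 487, whereas bRsSHC014 differs at both positions, which matches
the two residues singled out in the homology models.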

Origin and evolution of MERS-CoV 

Whereas the emergence of SARS involved palm civets, most of the early
MERS index cases had contact with dromedary camels. Indeed, MERS-CoV
strains isolated from camels were almost identical to those isolated
from humans 3-95 . Moreover, MERS-CoV-specific antibodies were highly
prevalent in camels from the Middle East, Africa and Asia 3,14,96-103 .
MERS-CoV infections were detected in camel serum samples collected in
1983 (REF. 100), suggesting that MERS-CoV was present in camels at
least 30 years ago. Genomic sequence analysis indicated that MERS-CoV,
Tylonycteris bat coronavirus HKU4 and Pipistrellus bat coronavirus HKU5
are phylogenetically related (denoted as betacoronavirus lineage C).
The viruses in this lineage have identical genomic structures and are
highly conserved in their polyproteins and most structural proteins,
but their S proteins and accessory proteins are highly variable.
MERSr-CoVs were found in at least 14 bat species from two bat families,
Vespertilionidae and Nycteridae. However, none of these MERSr-CoVs is a
direct progenitor of MERS-CoV, as their S proteins differ substantially
from that of MERS-CoV 98,104-106 .

To understand the evolutionary relationships between MERS-CoV and
MERSr-CoVs, we constructed a phylogenetic tree on the basis of the
alignment of all the coding regions (FIG. 4b; Supplementary Fig. S1b).
The phylogenetic tree contains two main clusters and several small
clades or strains. Overall, the genetic diversity within the L1 and L2
viral lineages is low, indicating that humans and camels have been
infected by viruses from the same source within a short time period.
The L1 viruses include human and camel MERS-CoVs, mainly from the
Middle East (the United Arab Emirates, the Kingdom of Saudi Arabia,
Oman and Jordan) and two Asian countries (South Korea and Thailand),
that had caused outbreaks in human populations. It is worth noting that
the cases reported in South Korea and Thailand were related to those in
the Middle East. The L2 viruses include camel MERS-CoVs from Africa
(Nigeria, Burkina Faso, Ethiopia and Morocco); these viruses have not
caused any human infection. Clearly, these two viral lineages share a
common ancestor but have diverged in their potential to cause human
infections. The MERSr-CoV strain Neoromicia/5038 (GenBank No.
MF593268), isolated in South Africa, was the closest relative to
MERS-CoVs in the phylogenetic tree. Overall, the MERSr-CoVs isolated
from bats support the hypothesis that MERS-CoV originated from bats.
However, given the phylogenetic gap between the bat MERSr-CoVs and
human and camel MERS-CoVs, there are probably other, yet-to-be-identified
viruses that are circulating in nature and that contributed directly to
the emergence of MERS-CoV in humans and camels. Hopefully, such viruses
will be found in bats in the future.

Not surprisingly, recombination events have taken place in the
evolution and emergence of MERS-CoV 4,105,107-109 . Phylogenetic trees
constructed using the genes encoding orf1ab and S were incongruent with
the tree topology of the complete genome, suggesting potential
recombination in these genes 8 . Numerous recombinations imply that
MERS-CoV originated from the exchange of genetic elements between
different viral ancestors, including those isolated from camels and the
assumed natural host, bats 94,105,107,110,111 .
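
The idea of topological incongruence between a gene tree and the
whole-genome tree can be illustrated with a toy comparison. The sketch
below is an assumption-laden illustration, not the method used in the
cited studies: trees are written as nested tuples of hypothetical taxon
labels ("A" to "D"), each rooted tree is reduced to its set of clades,
and clades found in only one of the two trees are reported. Real
analyses also weigh branch support and use dedicated
recombination-detection methods.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree of nested tuples.

    A leaf is a string; an internal node is a tuple of subtrees. Each clade
    is the frozenset of leaf names below one internal node.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def conflicting_clades(tree_a, tree_b):
    """Return the clades present in exactly one of the two trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return clades_a ^ clades_b


# Hypothetical topologies: in the genome tree A groups with B, whereas in
# the S gene tree A groups with C.
genome_tree = ((("A", "B"), "C"), "D")
s_gene_tree = ((("A", "C"), "B"), "D")

if __name__ == "__main__":
    for clade in conflicting_clades(genome_tree, s_gene_tree):
        print(sorted(clade))
```

An empty result means the two topologies agree; a non-empty result marks
the groupings that differ and is one signal, among several, that the gene
may have a different evolutionary history from the rest of the genome.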

Variability of human and camel MERS-CoV 

The full-length genomic sequences of MERS-CoVs isolated from humans and
camels are almost identical (>99% identity). The major variations are
located in S, ORF4b and ORF3, particularly in African camel MERS-CoVs.
Substitutions of a few amino acid residues were found in the S protein
of some camel MERS-CoVs, but none of them was located in the
RBD 4,112 . Neutralization assays indicated that camel sera that are
positive for MERS-CoV can completely neutralize the human MERS-CoV
strains, suggesting that MERS-CoVs isolated from humans and camels are
antigenically similar to each other 94 . MERS-CoVs from both humans and
camels contain variable ORF3 and ORF4 proteins with different lengths
owing to either terminal truncations or internal deletions 4 . ORF4b is
known to be an interferon antagonist 114 . MERS-CoV isolates from West
African camels with a truncated ORF4b gene replicate less efficiently
in human cell culture and are less pathogenic in human DPP4 transgenic
mice 4 . Curiously, deletion of the orf4 gene in the human MERS-CoV
strain EMC did not substantially reduce virus replication, although it
induced a stronger interferon response 4 . Another study demonstrated
that the deletion of orf3-orf5 dramatically attenuated MERS-CoV
virulence, primarily through increased host responses, including
disrupted cellular processes, increased activation of the interferon
pathway and robust inflammation 5 .


Variability of bat MERSr-CoVs 

The bat MERSr-CoVs identified to date share the same genomic structure
as human and camel MERS-CoVs but differ substantially in their genomic
sequences 105,106,110,111,116 . The highest overall genomic sequence
identity between bat MERSr-CoV and human and camel MERS-CoV is ~85%.
On the basis of their genomic sequences, several bat MERSr-CoV strains
discovered in China, such as Ii-MERSr-CoV, Ve-MERSr-CoV and
Hy-MERSr-CoV, have just reached the taxonomic threshold to be
considered the same species as MERS-CoV 106,110,111 .

Compared with human and camel MERS-CoV, bat MERSr-CoVs vary most in S
and accessory genes. The sequence identity of the S protein between bat
MERSr-CoVs and human and camel MERS-CoVs is approximately 45-65%, with
even lower sequence identity in the RBD region 0,1U . The size of the S
protein differs among these strains, mainly because of deletions in
the RBD region and/or at the S1 and S2 boundary. These deletions are
considered to be related to the differences in receptor binding and
cell entry L,U6 . The accessory genes, including those encoding ORF3,
ORF4a, ORF4b and ORF5, are also highly variable in length and sequence
between bat MERSr-CoVs and human and camel MERS-CoVs, suggesting
substantial evolution of these genes in their natural hosts
5,106,110,111,116 .
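
The identity figures quoted in this section (for example, about 85% at
the genome level and approximately 45-65% for the S protein) are pairwise
comparisons over aligned sequences. As a minimal sketch of how such a
figure can be computed, the function below counts matching columns in two
pre-aligned sequences. The function name, the toy sequences and the
choice to skip any column containing a gap are assumptions made for
illustration; other gap conventions give different percentages, and the
cited studies may have used a different one.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: columns with a gap are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real viral sequences).
toy_a = "ATGGCTAGCT-A"
toy_b = "ATGACTAGCTTA"

if __name__ == "__main__":
    print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

The same function applies to aligned amino acid sequences, which is how
nucleotide and amino acid identities for the same gene can differ.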

Receptor usage of MERS-CoV and MERSr-CoV 

In contrast to SARS-CoV, which uses ACE2 as its receptor, MERS-CoV
uses DPP4. Similar to the SARS-CoV S1-CTD, the MERS-CoV S1-CTD
functions as the viral RBD 0,11 . Like the SARS-CoV S1-CTD, the
MERS-CoV S1-CTD also contains two subdomains, a core structure and an
RBM 118-121 (FIG. 6e). The core structures of these two S1-CTDs are
similar to each other, with both containing a five-stranded β-sheet as
the main scaffold. However, their RBMs differ substantially: whereas
the SARS-CoV RBM mainly contains loops, the MERS-CoV RBM mainly
contains a four-stranded β-sheet. The structural differences between
the MERS-CoV and SARS-CoV RBMs account for the different receptor
specificities of the two viruses 21 .

Like the interactions between SARS-CoV and ACE2, the interactions
between MERS-CoV and DPP4 have been extensively examined. DPP4 from
humans, camels, horses and bats can function as a receptor for
MERS-CoV, whereas DPP4 from mice, hamsters and ferrets
cannot 12,122-125 . Key residue differences between human DPP4 and the
DPP4 from other species affect the species specificities of MERS-CoV.
For example, two residues (288 and 330) in mouse DPP4 and five residues
(291, 295, 336, 341 and 346) in hamster DPP4 are largely responsible
for the incompatibility of mouse and hamster DPP4s with MERS-CoV.
Mutating these residues to the corresponding residues in human DPP4
makes mouse and hamster DPP4 functional receptors for MERS-CoV. On the
other hand, MERS-CoV and MERSr-CoVs have been isolated from camels and
bats, respectively. MERS-CoV strains isolated from humans and camels
are highly similar to each other, and they both use human DPP4
efficiently 2 . MERSr-CoVs from bats in general share only ~60-70%
sequence identity with MERS-CoV in the RBD, and only some of these bat
viruses, including HKU4, recognize DPP4 as the receptor 110,111,126 .
However, they bind DPP4 less efficiently than MERS-CoV. Mutating three
residues in the HKU4 RBD (540, 547 and 558) substantially increased its
affinity for human DPP4. Overall, as in the case of SARS-CoV, receptor
recognition is a crucial determinant of the host range of MERS-CoV.

SADS-CoV 

From 28 October 2016 to 2 May 2017, swine acute diarrhoea syndrome
(SADS) was observed in four pig breeding farms in Guangdong province,
with a mortality of up to 90% for piglets 5 days or younger. A novel
HKU2-related bat coronavirus, named SADS-CoV, was identified as the
causative agent 4,128,129 . The SADS-CoV isolates from piglets of the
four farms were almost identical and shared 95% identity with
Rhinolophus bat coronavirus HKU2 (REF. 130), indicating the bat origin
of this pig virus. Immediately after the SADS outbreak, SADS-related
CoVs (SADSr-CoVs) with 96-98% sequence identity to SADS-CoV were
detected in 9.8% of anal swabs that had been collected from different
Rhinolophus species in Guangdong province during 2013-2016. Although
genetically highly similar overall, bat SADSr-CoVs show high genetic
diversity in the S gene, with 72-92% nucleotide and 80-98% amino acid
identity to SADS-CoV. Receptor analysis indicated that none of the
known coronavirus receptors (ACE2, DPP4 and aminopeptidase N) is
essential for SADS-CoV entry. The mechanism of transmission of SADS-CoV
from bats to pigs and the pathogenesis of bat-originated SADSr-CoVs in
pigs need further exploration. This is the first documented spillover
of a bat coronavirus that caused severe disease in domestic animals,
although molecular evolution data suggested that porcine epidemic
diarrhoea virus (PEDV) probably also originated in bats.

Conclusions and future perspectives 

The collected data on genetic evolution, receptor binding and
pathogenesis demonstrated that SARS-CoV most likely originated in bats
through sequential recombination of bat SARSr-CoVs. Recombination
likely occurred in bats before SARS-CoV was introduced into Guangdong
province through infected civets or other infected mammals from Yunnan.
The introduced SARS-CoV underwent rapid mutations in S and orf8 and
successfully spread in market civets. After several independent
spillovers to humans, some of the strains underwent further mutations
in S and became epidemic during the SARS outbreak in 2002-2003.
However, a recent serological investigation revealed the presence of
antibodies against the SARSr-CoV nucleocapsid in humans who live near a
bat cave but had not shown clinical signs of disease, suggesting that
the virus can infect humans through frequent contact.

A similar scenario might have happened for MERS-CoV. Since its outbreak
in 2012, MERSr-CoVs and related viruses (HKU4 and HKU5) have been found
in different bat species on five continents 7,21,106,110,111,116,126 .
The ORF1ab of these viruses is highly similar to MERS-CoV ORF1ab, but
they are highly diverse in their S proteins. Surprisingly, some bat
MERSr-CoVs and HKU4 can use the same receptor, DPP4, as
MERS-CoV 0,111,126,12 . Given the massive number of coronaviruses
carried by different bat species, the high plasticity in receptor usage
and other features such as adaptive mutation and recombination,
frequent interspecies transmission from bats to animals and humans is
expected.

Currently, no clinical treatments or prevention strategies are
available for any human coronavirus. Given the conserved RBDs of
SARS-CoV and bat SARSr-CoVs, some anti-SARS-CoV strategies in
development, such as anti-RBD antibodies or RBD-based vaccines, should
be tested against bat SARSr-CoVs. Recent studies demonstrated that
anti-SARS-CoV strategies worked against only WIV1 and not SHC014
(REFS 1,88,89). In addition, little information is available on
HKU3-related strains, which have a much wider geographical distribution
and bear truncations in their RBD. Similarly, anti-S antibodies against
MERS-CoV could not protect from infection with a pseudovirus bearing
the bat MERSr-CoV S protein. Furthermore, little is known about the
replication and pathogenesis of these bat viruses. Thus, future work
should focus on the biological properties of these viruses using virus
isolation, reverse genetics and in vitro and in vivo infection assays.
The resulting data would help the prevention and control of emerging
SARS-like or MERS-like diseases in the future.

It is widely accepted that many viruses have existed in their natural
reservoirs for a very long time. The constant spillover of viruses from
natural hosts to humans and other animals is largely due to human
activities, including modern agricultural practices and urbanization.
Therefore, the most effective way to prevent viral zoonosis is to
maintain the barriers between natural reservoirs and human society, in
keeping with the 'One Health' concept.